Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mrtrix3.run: Improved shebang parsing #2958

Merged
merged 5 commits into from
Aug 15, 2024
Merged

mrtrix3.run: Improved shebang parsing #2958

merged 5 commits into from
Aug 15, 2024

Conversation

Lestropie
Copy link
Member

@Lestropie Lestropie commented Aug 12, 2024

In doing #2957 I noticed an issue whereby whenever I manually invoked my compiled-from-source Python 3.12, any Python scripts that themselves invoke Python scripts would revert to Python 3.10 that was in my PATH. This is contrary to previously constructed code whose intent was to ensure that if a Python script invoked another Python script, the same interpreter would be used (#1828). This occurred because on dev, in the course of dropping Python2 support on dev (#2215), the script shebangs were changed to /usr/bin/python3 rather than /usr/bin/env python3---note that this persists through #2741---but the code for explicit shebang parsing was not tested on such.

I note that a reversion back to using /usr/bin/env seems to be the most appropriate course of action from discussion with @daljit46. But this shebang parsing code is not just for MRtrix3 executables, it tries to choose the most suitable interpreter for any shebang.

This change makes the relevant function more robust to both shebang styles, and enhances the version matching logic. If a script specifies a specific minor Python version (or indeed even patch version), the current interpreter will only be used if it is directly compatible with such; otherwise the shell will get to choose (including if the shell itself invokes env).

Tough to integrate the following as unit tests given that it depends on the contents of PATH. So I'm just doing a dump here. For context, the system Python version is 3.10, but I'm explicitly running a 3.12.4 from outside of PATH; this best shows the selection of when the current interpreter is run vs. other behaviour. Bold indicates cases where behaviour is different on Windows.

  • Test behaviour on Windows.
    In particular, the API supersedes env, because this was requisite at some point in the past to allow Python scripts to execute other Python scripts on Windows. But this may have been an MSYS2 problem that has since been rectified.

  • Discuss any contentious scenarios.
    Possible that @daljit46 may disagree with some of these, preferring to allow the system to decide what to use in all scenarios. However the case that made me pursue this (Python API: Support Python 3.12 #2957) was specifically about my using a non-system version of Python yet having subsequent scripts then executing on the system Python. So my current intuition is to preserve this behaviour. The other one that might be contentious is what to do if just "python" is specified as the destination interpreter. Here I'm invoking the current interpreter in that case.

Shebang contents POSIX parsed / error Windows Parsed / error
b'Not a shebang' No shebang found No shebang found
b'\n#!/newline/before shebang' ['/newline/before', 'shebang'] ['/newline/before', 'shebang']
b'#!/usr/bin/env' Invalid shebang: missing interpreter after "env" Invalid shebang: missing interpreter after "env"
b'#!/usr/bin/env python' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/env python2' ['/usr/bin/env', 'python2'] ['/usr/bin/python2']
b'#!/usr/bin/env python3' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/env python3.10' ['/usr/bin/env', 'python3.10'] ['/usr/bin/python3.10']
b'#!/usr/bin/env python3.10 extra' ['/usr/bin/env', 'python3.10', 'extra'] ['/usr/bin/python3.10', 'extra']
b'#!/usr/bin/env python3.11' ['/usr/bin/env', 'python3.11'] Invalid shebang: "env" executing on Windows, but unable to find command "python3.11"
b'#!/usr/bin/env python3.12' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/env python3.12 extra' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3', 'extra'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3', 'extra']
b'#!/usr/bin/env python3.10.1' ['/usr/bin/env', 'python3.10.1'] Invalid shebang: "env" executing on Windows, but unable to find command "python3.10.1"
b'#!/usr/bin/env python3.12.3' ['/usr/bin/env', 'python3.12.3'] Invalid shebang: "env" executing on Windows, but unable to find command "python3.12.3"
b'#!/usr/bin/env python3.12.4' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/env python3.12.5' ['/usr/bin/env', 'python3.12.5'] Invalid shebang: "env" executing on Windows, but unable to find command "python3.12.5"
b'#!/usr/bin/env python3.x' Invalid shebang: unable to extract Python version from text "#!/usr/bin/env python3.x" Invalid shebang: unable to extract Python version from text "#!/usr/bin/env python3.x"
b'#!/usr/bin/python' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/python2' ['/usr/bin/python2'] ['/usr/bin/python2']
b'#!/usr/bin/python3' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/python3.10' ['/usr/bin/python3.10'] ['/usr/bin/python3.10']
b'#!/usr/bin/python3.10 extra' ['/usr/bin/python3.10', 'extra'] ['/usr/bin/python3.10', 'extra']
b'#!/usr/bin/python3.11' [] []
b'#!/usr/bin/python3.12' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
b'#!/usr/bin/python3.12 extra' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3', 'extra'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3', 'extra']
b'#!/usr/bin/python3.10.1' [] []
b'#!/usr/bin/python3.12.3' [] []
b'#!/usr/bin/python3.12.4' ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3'] ['/home/unimelb.edu.au/robertes/src/Python3.12/bin/python3']
`b'#!/usr/bin/python3.12.5'\ [] []
b'#!/usr/bin/blah' ['/usr/bin/blah'] ['/usr/bin/blah']
b'#!/usr/bin/foo bar' ['/usr/bin/foo', 'bar'] ['/usr/bin/foo', 'bar']
b'#!' No shebang found No shebang found
b'#! /gap/in/shebang' ['/gap/in/shebang'] ['/gap/in/shebang']

@Lestropie Lestropie self-assigned this Aug 12, 2024

This comment was marked as outdated.

This comment was marked as outdated.

Copy link

clang-tidy review says "All clean, LGTM! 👍"

@Lestropie Lestropie marked this pull request as ready for review August 14, 2024 04:10
@Lestropie
Copy link
Member Author

Took a quick look on MSYS2. The /usr/bin/env workaround (running a shutil.which() on the specified interpreter) is still necessary on Windows; removing it results in "The system cannot find the file specified".

@daljit46 Do you want to make a case against the pre-existing behaviour where we ensure that executed Python scripts use the same interpreter as that currently executing, potentially overruling the system config? Otherwise I'll merge this.

@daljit46
Copy link
Member

I don't have strong objections against this PR.

I note that a reversion back to using /usr/bin/env seems to be the most appropriate course of action from discussion with @daljit46.

Yes, this is the expectation I have when running a script. I think it's the simplest solution and the most in line with Python's idea of venvs.
However, the author of a script is always in a better position to determine the Python versions (just like in C++ code the project author sets up the C++ standard). So if we want to have a custom logic to choose/impose the Python interpreter, I'm not opposed to it (as long as this is adequately documented).

@Lestropie Lestropie merged commit cd0d42f into dev Aug 15, 2024
6 checks passed
@Lestropie Lestropie deleted the python_shebang_parser branch August 15, 2024 01:46
@Lestropie
Copy link
Member Author

In responding on #2261, I realised there may be an outstanding problem with this kind of solution.

One of the approaches that had been proposed earlier for controlling the Python module search environment was to have executable files where the shebang points to a softlink to Python. That softlink would reside within the MRtrix3 build / installation directory, and just point to a suitable Python version. However by invoking via that softlink, the MRtrix3 API would intrinsically be included within Python's sys.path.

Say hypothetically that a third party software uses a similar approach, but instead of the shebang pointing to a softlink, it instead points to a Python in eg. a custom Conda environment, which contains custom libraries / library versions requisite for that command. Say for instance:

#!/usr/local/fsl/bin/python

With the code in this PR, that would result in use of the current interpreter, which may cause problems for that command.

For the FSL I have installed locally, this isn't the case, they instead use eg.:

#!/usr/bin/env bash
/usr/local/fsl/bin/python -I /usr/local/fsl/bin/eddy_quad "$@"

, which would be fine. But I don't know if this is what FSL has always done, or whether any other third party piece of software could hard-code a custom Python into their shebang.


There's a possible alternative strategy that comes to mind:

  • If the command is an MRtrix3 command, and it is a Python command (which is now known without even having to parse the file, CMake generates a list), then use the same interpreter as that currently running;
  • If the command is a non-MRtrix3 command, make no attempt to impose the currently executing interpreter; just take the shebang and use it.

I don't recall whether there have been any historical edge cases where this would cause problems. But some of them may have had to do with malformed shebangs, which may have been resolved upstream in third-party softwares.

I'll let this simmer for a bit, but my current instinct is that this would be a better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants